Cosmological parameter estimation using Particle Swarm Optimization (PSO) 
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Constraining theoretical models, which are represented by a set of parameters, using observational 
data is an important exercise in cosmology. In Bayesian framework this is done by finding the 
probability distribution of parameters which best fits to the observational data using sampling 
based methods like Markov Chain Monte Carlo (MCMC). It has been argued that MCMC may 
not be the best option in certain problems in which the target function (likelihood) posses local 
maxima or have very high dimensionality. Apart from this, there may be examples in which we are 
mainly interested to find the point in the parameter space at which the probability distribution has 
the largest value. In this situation the problem of parameter estimation becomes an optimization 
problem. In the present work we show that Particle Swarm Optimization (PSO), which is an artificial 
intelligence inspired population based search procedure, can also be used for cosmological parameter 
estimation. Using PSO we were able to recover the best fit LCDM parameters from the WMAP seven 
year data without using any prior guess value or any other property of the probability distribution 
of parameters like standard deviation, as is common in MCMC. We also report the results of an 
exercise in which we consider a binned primordial power spectrum (to increase the dimensionality 
of problem) and find that a power spectrum with features gives lower chi square than the standard 
power law. Since PSO does not sample the likelihood surface in a fair way, we follow a fitting 
procedure to find the spread of likelihood function around the best fit point. 

PACS numbers: 



I. INTRODUCTION 

In a typical CMB data analysis pipeline first the time 
order data, obtained from an instrument like WMAP, is 
reduced into a set of sky maps from which angular power 
spectra are computed, and finally these spectra are re- 
duced into a set of cosmological parameters representing 
a model usually using Bayesian analysis [H-13,13-@- The 
exercise of parameter estimation involves identifying a 
set of parameters which has the highest probability of 
giving the observed data i.e., finding a point in the mul- 
tidimensional parameter space at which the likelihood 
function has the greatest value. Currently this exercise is 
done using some sampling based methods, like MCMC, 
in which the likelihood function is sampled at discrete 
points, which are further used to compute various statis- 
tics of parameters 0, @, Apart from MCMC, non- 
sampling based methods inspired from artificial intelli- 
gence techniques, like artificial neural network, have also 
been successfully applied in cosmological parameter esti- 
mation from the CMB data fill ]. 

In the present work we demonstrate the use of another 
artificial intelligence inspired method, named Particle 
Swarm Optimization or PSO fl2l4l4| , for cosmolo gica l 
parameter estimation using WMAP seven year datafla]. 
Being a stochastic method, PSO also has the interest- 
ing feature that the computational cost for searching the 
global maximum in the multi-dimensional space does not 
grow exponentially with the dimensionality of the search 
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space. However, in this case also (like other stochastic 
methods) the probability of convergence to the global 
maximum is usually guaranteed only in the asymptotic 
limit. Based on a very simple idea and having very few 
design parameters, PSO is quite easy to program and 
can provide accurate results very fast. As compared to 
artificial neural network [ll[ all the calculations in PSO 
are exact, in the sense that no extrapolation or interpola- 
tion is done at any stage i.e., fitness function is computed 
exactly at all points. It has been found that PSO can out- 
perform MCMC in certain situations, in particular when 
there are a large number of local maxima present and /or 
the dimensionality of the search space is very high [16|. 

It has been a common practice to consider a feature- 
less primordial power spectrum (PPS), characterized by 
two parameters tilt (n s ) and amplitude (A s ). In this case 
also there is a degeneracy between the parameters of PPS 
and other cosmological parameters (like ilt,), but the like- 
lihood surface remains fairly smooth and does not poses 
much challenge for the MCMC method. There have been 
studies which show that a PPS with features is a better 
fit to the observational data as compared to the feature- 
less power law PPS H2HI3. 

In one of such studies (l9j a 
PPS with oscillations is considered and it is argued that 
due to additional degrees of freedom, as a result of fea- 
tures, the MCMC approach is not very successful since 
there are a large number of local maxima present in the 
search space and the chains frequently tunnel from one 
local maximum to another. In order to circumvent this 
problem it has been recommended first to carry out the 
search space over a subset of parameters over a grid and 
then use full MCMC. PSO can be quite useful in this 
type of problem due to its better capabilities of handling 
higher dimensional search space and large number of lo- 
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cal maxima. 

Since the fitness function i.e., the function to be opti- 
mized, can be computed concurrently on a parallel plat- 
form for a large number of particles, PSO promises to 
give accurate results very fast, if implemented efficiently. 

Unlike MCMC which provides the full probability dis- 
tribution from which marginalized values of parameters 
and error bars can be computed, PSO just gives the lo- 
cation of the best fit point, called the Gbest, (definition 
in next section) and one needs to find some way to com- 
pute error bars. In the present work we fit the effec- 
tive chi square (xls = — 21og£) by a multi-dimensional 
paraboloid around the best fit point and compute the 
error bars from the fitting coefficients of that. On the 
basis of the errors we set the range for the two dimen- 
sional grids which we consider for various combinations 
of parameters (keeping all other parameters fixed to their 
best fit values) and make contour plots which show not 
only the extent to which the likelihood surface is spread 
(errors) but also the correlation between various param- 
eters. These contour plots also confirm that PSO does 
find the global maximum of the likelihood function. 

We observe that in most cases (which we have consid- 
ered) not only does the Gbest approach towards the best 
fit point, the average location of particles also approaches 
towards it, as PSO progresses. The average location of 
PSO particles in the multi-dimensional search space can 
be used to check the robustness of PSO. Since PSO is 
designed mainly for finding the best fit point, therefore 
the sampling done by PSO particles is not designed to be 
a fair representation of the likelihood surface. 

The plan of the paper is as follows. In §|TT]wc discuss a 
simple implementation of the Particle Swarm Optimiza- 
tion in detail. In particular, we focus on the dynamics 
of particles, setting-up initial conditions, boundary con- 
ditions and the convergence criteria. We also define all 
the design parameters and variables of PSO in § [TTJ A 
very brief overview of parameter estimation in Bayesian 
formalism is discussed in EjTTTJ In place of discussing the 
full prescription of Maximum Likelihood (ML) estima- 
tion, we mainly focus on the computation of error bars 
from covariance or Hessian matrix in this section. We 
present our results of parameter estimation using PSO 
for WMAP seven year data in § HVl Apart from giving 
the best fit parameters and errors which we get from our 
fitting exercise, we also present contour plots for various 
combinations of the parameters. Along with the best 
fit parameter estimates which we get from PSO, we give 
a comparison of the results obtained from PSO and as 
are reported by WMAP team using MCMC. In |jV] we 
summarize the PSO and discuss its advantages and dis- 
advantages. 



II. PARTICLE SWARM OPTIMIZATION 

Formally proposed by James Kennedy and Russell 
Eberhart in 1995 [l2[, PSO has been successfully tested 



and applied in many engineering and artificial intelligence 
problems [20h221 | . Recently it has been applied in as- 
trophysical problems also O US H3- I n PSO, a set of 
"particles" driven by "cognition" and "social" factors ex- 
plore the multi-dimensional search space by carrying out 
random walks, determined by a set of "design parame- 
ters" . The accuracy and performance of the algorithm 
depends on the values of the design parameters as well 
as the function to be optimized i.e., the fitness function. 

Particle Swarm Optimization is based on observations 
of social dynamics, bird flocks, fish schools and other 
forms of group behavior. Personal discoveries made by 
members of the group and shared with everyone else can 
help everyone to become more efficient in making new 
discoveries. Efficient personal search and communication 
with other members of the group can lead to rapid success 
for the group in search of some common goal i.e., food 
etc. 

At present there exists many implementations of the 
Particle Swarm Optimization. Here we consider one of 
the simplest forms [l2j the elements of which are shared 
by many other implementations. Before describing the 
working of our PSO implementation, it is useful to define 
some of the key terms which are used to describe PSO. 

1. Particles:- The term "particles" in PSO is used 
for "computational agents" and has no relation 
with any form of physical particles. PSO particles 
have no mass and occupy no volume (however, they 
can have weights called inertia weights which will 
be discussed later). PSO particles are distinguished 
from each other on the basis of their identification 
numbers or ids and have "positions" and "veloc- 
ities" . In our discussion we represent the position 
and velocity of a particle with id i at step ( "time" ) 
t by vector X l (t) and V*(t) respectively. 

2. Fitness function:- The function F(X) to be used 
for searching the global maximum is called the "fit- 
ness function" or "optimization function" . In the 
present case we use — 21og£ or xZfi as our opti- 
mization function, where C is the CMB likelihood 
function. 

3. Pbest:- The maximum value of the optimization 
function F l (t) for a particle i till the present step 
(time N t ) is called its Pbest 1 . 

Pbest' = Max{.F l (i),i = 0,1,2..., N t } (1) 

the location of the Pbest' is represented by the 
vector P l 

P i = X\t) if F l {t) = Pbest 1 (2) 

4. Gbest:- The largest value of Pbest among all 
particles is called the Gbest. The value of 
Gbest changes only when any of the particles finds 
a new position at which the value of the fitness 
function is larger than any of the earlier values. 

Gbest = Max{Pbest', j = 0, 1, 2..., N p }. (3) 
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Here N p is the number of particles. The location 
of the Gbest is given by the vector G. 



G = X l {t) if Pbest* = Gbest. 



A. Dynamics of PSO particles 



(4) 



The following equation is used to update the positions 
of the particles [12J , 

X i (t+l)=X i (t) + V i (t+l), (5) 

where velocity V l (t + 1) for the particle i at step (t + 1) 
is computed in the following way 

V\t + 1) = wV\t) + ci& {X l {t) - P 1 } 

+c 2 6 {X l (t) - G} . (6) 

Here c\ and C2 are called acceleration coefficients, w is 
called the inertia weight and £1 and £2 are two uniform 
random numbers in the range [0,1]. The values of the 
acceleration coefficients C\ and C2 decide the contribution 
due to personal (cognitive) learning and social learning 
respectively. 

The first factor in the right hand side of equation <j6j) 
moves the particle along a straight line and the second 
and third factors accelerate it towards the location of 
Pbest and Gbest respectively. Kennedy and Eberhart 
[l2| use ci = C2 = 2 to give it a mean of unity, so that the 
particle would overfly the target about half of the time. 

Although the equation © is most commonly used, the 
following version is also in use [23j . 

V*(t + 1) = K{V\t) + Cifi (X l (t) - P l ) 

+C 2 ^(X i (t)-G)} (7) 

where K is called the constriction factor and is defined 
in the following way: 



K = 



(8) 



where <p = C\ + C2. Here the recommended values are 
Ci = C 2 = 2.05 which gives K = 0.729. Equation © 
is equivalent to equation (J6j> with c\ — KC\ , ci = KC<i 
and w = K. Since there are many implementations (with 
different values of design parameters or with some new 
parameters) we have decided to work with PSO standard 
2006 [25| which uses the following values of the design 
parameters in equation © 



1 



21og(2; 



= 0.72 



and 



Cl 



C2 



0.5 + log(2) = 1.193 



(9) 



(10) 



since we were able to get quite accurate results with the 
values of design parameters suggested in PSO standard 



2006 [25J, we decided to adopt these values in our imple- 
mentation. We did try a few other values but could not 
find any significant improvement. 

In particle swarm optimization all the particles can 
communicate with each other or the communication can 
be restricted between only subsets of particles. The first 
case is found to be more useful for intensive local search 
and the second one for global search. In our implemen- 
tation we let every particle share the information about 
its Pbest with every other particle. 



B. Maximum Velocity 

In order to stop particles leaving the search space we 
need to limit the maximum velocity which particles can 
acquire. This can be done by setting the maximum ve- 
locity along various dimensions. It has been a common 
practice to keep the maximum velocity proportional to 
the search range. 

if V\t) > v max 
, if 7<(t) < -V m 

where V ma x is also a design parameter. We use V max = 
Cv(X max - X mm ) with c v = 0.5 where [X min , X max ] is 
our search range. This means that the biggest jump a 
particle can make is half of the size of the search range. 



C. Initial Conditions 

We assign random positions and velocities to particles 
in the beginning. 




X\t = 0) = X, n 



■ £ X {X r , 



Xmin ) 



(12) 



and 



V l {t = 0) =£V n , 



(13) 

where £ is a uniform random number in the range [0 — 
1]. Apart from the above initial conditions "particles 
on a grid" initial condition can also be used. From our 
trial runs we have found that the final outcome i.e., the 
location of the Gbest is not very sensitive to the initial 
condition. 



D. Boundary condition 

We use the "reflecting wall" boundary condition in 
which a particle reverses its velocity component perpen- 
dicular to the boundary when it tries to cross the bound- 
ary. 



V*(t) = -V\t) 



and 




when X l (t) > X v 
when X l {t) < X n 



(14) 



(15) 
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E. Termination criteria 

PSO particle trajectories are like "chains" in MCMC, 
however, they are coupled to each other by second ac- 
celeration coefficient c 2 . In the limit of c 2 = parti- 
cle do not exchange information but in that case PSO 
will become meaningless. We have used Gelman-Rubin 
R statistics [1^, [27j in order to find out when the explo- 
ration by PSO particles should be stopped. In order to 
use Gelman-Rubin statistics we use the mean of variance 
W within PSO particle trajectories and variance of the 
mean B across PSO trajectories. 

At any time N t the mean value the trajectory of a PSO 
particle is 



Nt 



and the variance 



0? = 



1 Nt 



and the mean of variance 
W 



1 n p 

iv p i=i 



(16) 



(17) 



(18) 



In order to compute variance of means we firstly compute 
mean of means 



A", 



and then 



B 



Nt 



i=i 



(19) 



(20) 



The variance of stationary distribution can be written as 
weighted sum of W and B. 

Z = (l — I W + —B (21) 

V N t ) N t V ' 

The potential scale reduction factor R is given by 

A =VJ (22) 

In general when the value of R as low as 1 we can assume 
that the convergence has been obtained. 



III. COSMOLOGICAL PARAMETER 
ESTIMATION 

Cosmic Microwave Background temperature and po- 
larization anisotropics observed in the sky represent the 



Parameter 


Description 


Value 


w 


Inertia weight 


0.72 


Cl 


Acceleration parameters (personal) 


1.193 


Cl 


Acceleration parameters (social) 


1.193 


C-v Vmax / 


maximum velocity parameter 


0.5 


N v 


Number of particles 


30 


N d 


Search dimensions 


6 



TABLE I: PSO design parameters 



fluctuations in the baryon-photon fluid at the epoch of 
last scattering i.e., when electrons were last scattered by 
photons before they combined with protons and formed 
hydrogen atoms, contain a lot of information about the 
cosmological parameters 0, 128l - l30l ] . Due to the Gaussian 
nature of the density fluctuations at the epoch of last 
scattering (primordial fluctuations) as predicted by infla- 
tionary models, the most important information about 
the cosmological parameters is encoded in the angular 
two point correlation function or power spectrum. 

It is a common practice to represent the temperature 
anisotropies in the CMB sky in spherical harmonic ex- 
pansion 



AT(j 



/] ah, 

III! 



Ximin) 



(23) 



where To is the average temperature i.e., the monopole 
term. The angular two point correlation function is given 
by 



c(0) = ( A 



Tn 



Tn 



E 



21 + 1 

47T 



CiPicosO (24) 



where 9 is angle between directions fi and n' in the sky. 
The angular power spectrum Ci is defined as 



Ci = (ai m a* m ) = (|a iTO | 2 ) . 



(25) 



As mentioned above, the angular power spectrum C; 
(or angular two point correlation function C{6)) depends 
on a large number of cosmological parameters repre- 
senting various energy densities in the universe, primor- 
dial fluctuations, and the physical processes relevant in 
the early universe like reionization and recombination 
[28l [3l1 - [33j | . Many of the cosmological parameters af- 
fect the angular power spectrum in the same way, i.e., 
have degeneracies. However, it is possible to form a set 
of parameters, called "normal parameters" which affect 
the angular power spectrum in a unique way (32l . |3~4| . 
The most common cosmological parameters which have 
been used to fit observational Ci (we also use these) are 
density parameters for cold dark matter (f2 c /i 2 ), baryons 
(QbH 2 ), cosmological constant (^a), primordial scalar 
power spectrum index (n s ), and normalization (A s ), and 
reionization optical depth (r). In our analysis we do not 
consider the "Hubble parameter" ft as a free parameter 
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and compute its value from other parameters for a spa- 
tially fiat model. 

V 1 — "A 

It is now a common practice to follow a " line of sight " 
integration approach for computing the angular power 
spectrum Ci for a set of cosmological parameters which is 
a computationally expensive process. The publicly avail- 
able code CAMB l35l. l36j is based on an earlier code 
named CMBFAST |37| which can compute the angular 
power spectrum Ci on a shared memory platform in a 
short time. 



In the present work we use the likelihood code provided 
by the WMAP team for computing the likelihood func- 
tion, which takes the theoretical angular power spectrum 
computed by CAMB, and the power spectrum estimated 
by the WMAP experiment |4(J, as inputs. 

The exercise of obtaining the best fit cosmological pa- 
rameters involves finding a point in the multidimensional 
parameter space, at which the value of the likelihood 
function C is maximum or — 21og£ is minimum. Apart 
from the best fit values one is also interested in the error 
bars on the estimated parameters which involves know- 
ing the shape of the likelihood function around the best 
fit values, for which we follow a fitting procedure as dis- 
cussed below. 



A. Bayesian Analysis 



B. Likelihood fitting 



In the framework of Bayesian analysis the probability 
of obtaining a set of parameters 9 which is consistent 
with a data set D for a given prior / is given by 



P(Q\D,I) 



p(p\ej)p(e\i) 

P(D\I) 



(27) 



In the above equation P(Q\D,I) is the posterior prob- 
ability distribution, P(D\Q,I) is the likelihood function 
(which will be represented by C) and P(Q\I) is the prior. 
The denominator P(D\I) called "evidence" is used for 
the purpose of normalization and does not depend on 
the parameters O so it can be ignored for the present 
purpose. 

The likelihood function for a CMB experiment with 



N p pixels is given by Q 



£(A|6) = 



(2tt) n p/ 2 \C\ n p/ 2 



cxp 



1 



AC _1 A 



(28) 



where A represents an estimator of the observed data 
vector, having N p entries, and C is the joint covariance 
matrix i.e., sum of the signal and noise covariance matrix 



C = S + N. 



(29) 



The noise covariance matrix N can be approximated 
by a diagonal matrix and the signal covariance matrix S 
is given by @ 



o/ i 1 



-in 



(30) 



Close to the best fit point we can approximate the 
likelihood function Gaussian : 



1 



C = C exp[--A J RA] 



(31) 



where Aj = #i — #i,o where is the maximum likelihood 
value of the parameter 0, and R is the curvature matrix. 
The covariance matrix C which is the inverse of curvature 
matrix R gives an estimate of the standard errors which 
maximum likelihood fitting can give 

The standard error A9i in parameter Qi is given by: 



A6i = ^[Cki = 1/VW' 



(32) 



We fit — 2(log£ — log£o) = A^ff with a paraboloid 
and compute the coefficients of fitting and identify those 
with the elements of curvature matrix. 



IV. RESULTS 

We compute the best fit cosmological parameters from 
the WMAP seven year data for a six parameter model 
with model parameters flbh 2 , Q c h 2 , Q\, n s , A s and r us- 
ing PSO. Before presenting our results quantitatively, we 
consider it useful to present a qualitative comparison of 
the way parameters are estimated in the Markov-Chain 
Monte Carlo methods and in Particle Swarm Optimiza- 
tion. In particular, we want to highlight the way param- 
eter space is explored and sampled in PSO and MCMC 
methods. 



where is the angle between the directions hi and hj 
representing pixel i and pixel j respectively. 

Exact likelihood computation by a brute force method 
is computationally expensive since it involves inversion 
of a N p x Np matrix which is a formidable task for an 
experiment with very large number of pixels. Many ap- 
proximations have been proposed which reduce the cost 
of likelihood computation significantly @, HH, Hil • 



A. Markov Chain and PSO exploration 

The nature of exploration by a Markov chain and that 
by a set of PSO particles is completely different. How- 
ever, there are some similarities also, for example, in both 
the cases the random walk is governed by the optimiza- 
tion function or the fitness function. In MCMC, the pro- 
posal density is directly related to the function to be 
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1 000 



1500 



steps 



FIG. 1: In this figure the red line shows a Markov chain 
which has been obtained from a typical run of COSMOMC 
and the green line shows the trajectory of a PSO particle, 
along the same dimension i.e. Qt,h 2 . Markov chain as well 
as a PSO trajectory can begin anywhere in the range and 
progressively move towards the best fit location. However, in 
the case of PSO the particle approaches towards the best fit 
location (Gbest) in an oscillatory manor with successively de- 
creasing amplitude, which is not the case for a Markov Chain 
since its step size does not vary much. Only after a sufficient 
number of PSO steps the particle positions and the Markov 
chain converge. Since there are more number of points for the 
Markov chain as compared to the PSO, we use x-scale such 
that we have five Markov points for every PSO point. 



0.02 



0.03 

n k2 



0.04 



FIG. 2: In this figure the red and the green points show the 
distribution of the positions of PSO particles and samples 
from a Markov chain respectively, in the same plane. From 
the figure it can be noticed that in the initial stage the scatter 
of PSO particles is very large (see Figure [T] also), however, 
close to the convergence all particles get confined in a very 
compact region. The distribution of the sample points in the 
case of Markov chain is much more symmetric than in PSO. 
We suspect that this is due to the different role played by the 
stochastic variables (random numbers) in PSO as compared 
to that in the Markov chains. The non-symmetric distribution 
makes PSO less favorable if we want to find the shape of the 
likelihood close to the best fit values (in order to report errors) 
in comparison to Markov Chain. 



sampled. In the case of MCMC exploration of a chain 
is completely local, in the sense that whether a step will 
be selected or rejected depends only on the values of the 
fitness function at the current location and the next loca- 
tion. However, in the case of PSO, particles always have 
some information about the global maximum Gbest , 
which keeps changing. In general the step size does not 
change in MCMC, however, in the case of PSO it rapidly 
falls as Gbest approaches close to the global maximum. 

Markov chains sample the fitness function in the 
multi-dimensional parameter space using methods like 
Metropolis Hastings. The sampling algorithm ensures 
that more points are sampled from regions in which the 
fitness function has large values and less points from the 
regions in which it has small values. The values of best 
fit parameters are obtained after marginalization. In case 
of PSO a set of "particles" explore the multidimensional 
space guided by their personal i.e., Pbest and collec- 
tive i.e., Gbest discoveries (see equation (|5|) and equa- 
tion The progress of a chain in MCMC and tra- 
jectory of a particle in PSO are very different. Starting 
from any arbitrary point in the multi-dimensional space 
both approach towards the region where the probabil- 
ity of global maximum is high, however, the way they 
approach is different. 

Figure [1] shows a typical Markov chain and the trajec- 



tory of a PSO particle. Since there were greater num- 
ber of steps in the Markov chain than in PSO, we have 
stretched the x-axis for PSO trajectory by a factor of five, 
i.e, there are five Markov chain points for every PSO 
point for the same range on the x-axis. From the fig- 
ure it can be noticed that the PSO particle reaches the 
global maximum by performing oscillatory motion with 
gradually decreasing amplitude. However, in the case of 
Markov chain the progress is very smooth. 

One of the most common ways to present the results 
of a parameter estimation exercise is to make two dimen- 
sional scatter or contour plots. In MCMC it is done by 
marginalizing the sampled function along all other di- 
mensions apart from the two for which we want scatter 
or contour plots. For a general case, the location of the 
point at which the likelihood function peaks may be dif- 
ferent from the average location computed on the basis 
of the one dimensional probability distribution obtained 
after marginalization. In the case of PSO also, we present 
the average location of the particles, apart from finding 
the point at which the likelihood function peaks. 

The red and green points in Figure [2] show the projec- 
tion of the positions of PSO particles and a set of sample 
points from a Markov chain in a two-dimensional plane 
of the parameter space, respectively. From the figure it 
can be noticed that although the sampled points in both 
cases cluster around the same point (Gbest in PSO) 
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1 10 100 

PSO step 

FIG. 3: The solid line in the above figure shows the change 
in the fitness function — 21og£ as PSO steps progress, and 
the dashed line shows the value for the WMAP seven year 
data. From this figure it can be noticed that in the beginning 
improvement in the value of the fitness function is quite rapid, 
but after some time it saturates, primarily because once the 
particles reach close to the global maximum as given by the 
Gbest their velocities drop. 



the distribution arc completely different. In particular 
the points are more symmetrically distributed around the 
best fit value in the Markov chain case as compared to 
that in PSO. Since PSO results are always quoted in term 
of Gbest therefore the distribution of points does not 
change the results in any way. However, since in the 
present work we make explicit use of the PSO particle 
distribution also (for fitting the likelihood function for 
computing errors), it can create some problem. 



B. Best fit cosmological parameters 

In order to test the working of our PSO module 
we considered a six dimensional cosmological model 
(Vlbh 2 , n c h 2 , Q\, n s ,A s ,T) (see Figures 1 of Q) and tried 
to estimate its parameters from the WMAP seven year 
data. The range over which we tried to optimize our 
fitness function and the results are given in Table fn|) . 

In Figure [3] we show the evolution of the fitness func- 
tion (— 21og£) with PSO steps. We also show value for 
the fitness function for WMAP seven years best fit cos- 
mological parameters (dashed line). From the figure it 
can be noticed that the value obtained by PSO finally 
converges to the WMAP seven year value. Since the ve- 
locity with which particles move toward Gbest is propor- 
tional to their distance from Gbest, we get large jumps 
in the fitness function in the beginning. 

In PSO the values of the best fit parameters, location 
in the parameter space at which the likelihood function 
peaks, is represented by Gbest. For consistency and 
robustness we not only give the location of the Gbest, 



we also give the average location of the PSO particles. 
It is not a surprise that as PSO progresses, the average 
position of PSO particles and the location of the best 
fit point converge to Gbest. In a case when there are 
local maxima also present, some of the PSO particles 
may trap in these, but, the average location of particles 
still follows Gbest. In Figure 2] we show the location of 
Gbest and the average position of the PSO particles in 
our six dimensional search space at different steps. Note 
that in our model h is not a fitting parameter, we get its 
value from the flatness condition (see equation t\'26\i ). 

The black, red and blue lines in Figure [5] show the best 
fit angular power spectrum obtained by MCMC analysis, 
from PSO code and the binned power spectrum (with 
error bars) provided by the WMAP team for the seven 
year data. From the figure it is clear that the power 
spectrum which we obtain from our PSO code closely 
follows other two curves. 



C. Error estimates 

In order to compute error bars on the parameters es- 
timated using PSO, we fit (as discussed in § IIII B[) a six 
dimensional paraboloid to a subset of sampled points to 
Axeff = — 21og£ — (— 21og£ ) where C is the value of 
the likelihood function or Gbestat the last PSO step. 

A X ^ = [6]N[6] T (33) 

where O = (ilfc/i 2 , fi c /i 2 , ^a, n s , A s , r) and [a] is a 6 x 6 
symmetric matrix with 21 independent coefficients. 
The six dimensional vector O is defined as 

e = 9 ~ 9Gbest . (34) 

©Gbest 

We used multi-parameter fitting subroutine of GNU 
scientific library for the fitting j4l| . In order to limit our 
fitting to a subset of points, we consider only those points 
which arc within a six dimensional hyper-sphere of radius 
i.e., \9i\ < 0.1 where Oi is a component of the vector O, 
and A X 2 eS < 10. 

After obtaining the fitting matrix [a] we invert it and 
compute the covariance matrix [C] = [a] -1 and compute 
error for the parameter 9i from it 

A0i = VOl x 0i, Gbes t. (35) 

We present the error in various parameters in the fifth 
column of Table |n|. 

D. Two dimensional contour plots 

Fitting as we have done may not give a very good es- 
timate of the errors on parameters. The exact way to 
figure out how the likelihood surface behaves around the 
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FIG. 4: In each panel of the above figure we show the location of the best fit points (location of Gbest) and the average location 
of PSO particles, by the solid and dot-dashed lines respectively. We also show the best fit values given by the WMAP team 
by dashed lines. From the above figure it can be noticed that as PSO progresses the average location of PSO particles and 
the location of the Gbest converge, which can be used as a robust check. For most cases the best fit values obtained by PSO 
match well with standard LCDM values, but there are some differences also (see table Hi) . 
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Cosmological parameters from PSO 



Variable 


PSO best fit 


WMAP best fit [9] 


Difference (Gbest-M.Lj 




Range 


/-IT_ ■ / _ 'A 1—7 A f i~. rro\ 

Gbest(x cf f = 7469.73) 


Mean 


std dev. 


ML (Xoff = 7486.57) 


Mean 






(0.01,0.04) 


0.022036 


0.022030 


0.000456 


0.02227 


0.02249%To & o b 5 7 


-0.000234(-1.05%) 




(0.01,0.20) 


0.112313 


0.112435 


0.005276 


0.1116 


0.1120 ±0.0056 


0.000713 (0.63%) 


n A 


(0.50,0.75) 


0.721896 


0.720353 


0.029047 


0.729 


„ 797 +u.u:iu 

u - ' z ' -0.029 


-0.007104(-0.97%) 


n a 


(0.50,1.50) 


0.963512 


0.963278 


0.011730 


0.966 


0.967 ±0.014 


-0.002488(-0.25%) 




(1.0,4.0) 


2.448498 


2.454202 


0.106615 


2.42 


2.43 ±0.11 


0.028498(1.17%) 


T 


(0.01,0.11) 


0.08009 


0.083930 


0.012113 


0.0865 


0.088 ±0.015 


-0.00641(-7.41%) 



TABLE II: The first column in the above table shows the PSO fitting parameters and the second, third, fourth and fifth columns 
show the search range, the location of Gbest, the average position of PSO particles and the error (which is computed by fitting 
the sampled function) respectively. In the sixth and seventh columns we give the best fit (ML) and the average values of the 
cosmological parameters derived from WMAP seven years likelihood estimation respectively. In the last column we give the 
difference between our best fit parameters (PSO parameters) and WMAP team's best fit parameters (difference between ML 
and Gbest values). From this table it is clear that roughly there is good agreement between the PSO best fit parameters and 
WMAP team's best fit parameters from the seven year data. 
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FIG. 5: The red, black and blue lines in the above figure 
represent the best fit angular power spectrum recovered from 
PSO, standard LCDM power spectrum and the binned power 
spectrum of WMAP seven year data respectively. Note that 
the PSO best fit angular power spectrum is very close to that 
provided by the WMAP team. The small difference (see ta- 
ble (|II|) between the PSO best fit parameters and the WMAP 
best fit parameters leads to a difference of 7 in XcS (it i s 
smaller by 7 for PSO). 



best fit location, which is given by Gbest in our case 
is to compute the likelihood function on a grid around 
the best fit point. Numerical computation over a multi- 
dimensional grid is quite expensive and even for a mod- 
erate size grid of 24 we have to do 24 6 computations for a 
six parameter cosmological model. In place of consider- 
ing a multi-dimensional grid we consider, iV^iVd — l)/2 
two dimensional grid as is done by some other authors 
[3~4| . We fix the values of the four parameters out of 
six to their best fit values i.e., Gbest and draw the two 
dimensional grid over the other two parameters. 

In Figure [5] and [7] we show two dimensional contour 



plots for different pairs of cosmological parameters. The 
contours from the innermost are for A\ 2 = 2, 6, 8 and 10. 
Note that the range for grids is selected from the rough 
estimates of errors which we get from fitting. The range 
for Q c h 2 and fl\ is taken 2a and for others it is taken 3<r 
where a is the error obtained from the fitting 

E. Primordial Power Spectrum (PPS) with power 
in bins 

In order to demonstrate an interesting example in 
which PSO can be useful, we consider a model in which 
the primordial power spectrum has "binned" power in 
place of being a power law. We consider the power in bins 
as free parameters and that make our model higher di- 
mensional. Apart from having 20 free parameters, which 
gives power in logarithmic bins we consider rest of the 
four parameters n^h 2 ,£l c h 2 ,£Ia and r also free. As is 
expected, a model with more parameters better fits the 
observational data, which in the present case is WMAP 
seven year data i.e., for a primordial power spectrum with 
binned power % 2 is lower by 7 as compared to the stan- 
dard power law mode. The best fit primordial power 
spectrum and angular power spectrum are shown in Fig- 
ure [5] and Figure [5] respectively. Note that the PPS with 
binned power gives better fit, particularly at low 1. We 
consider here the particular form of PPS not to motivate 
any particular theoretical model but just to demonstrate 
the method we use. 



F. Computational performance 

Computing the fitness function i.e., —2 log £, which 
is the most expensive part in the parameter estimation 
procedure, involves two steps. In the first step the C;s 
are computed for a set of cosmological parameters for 
which we use the publicly available code CAMB [36| 
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FIG. 6: Panels in this figure show the two dimensional contour plots for various pairs of the six cosmological parameters 
Q.bh 2 , Q. c h 2 , Q.a, n s , As and r. We have considered a grid of size 24 x 24 for our exercise considered the 3cr region, (where a is 
the error computed from the fitting) around the best fit point. In the case when we do not have any idea about the a we can 
give any other trial value also. The contours in the figure (from inside) are for A\ 2 = 2, 4, 6, 8 and 10. Note that the contour 
plots not only give an idea about the area around the best fit region, they also clearly demonstrate correlation between various 
cosmological parameter. 



which employs OpenMP pragmas for doing computation- 
ally intensive steps in parallel on multi-processor shared 
memory systems. In the second step the likelihood func- 
tion is computed from C;s and the observational data 
i.e., WMAP seven year data, for which we use the likeli- 
hood code provided by the WMAP team. Since our PSO 
code shares two main modules (C; and likelihood com- 
putation) with the publicly available code COSMOMC, 
the difference in the performance is expected only due 
to the number of times the fitness function is computed. 



Computationally a typical PSO run gives very good con- 
vergence with 30 PSO particles with 160 steps i.e., 4800 
computations. We ran a typical COSMOMC run with 
24 chains and found that (from reading .log file for ev- 
ery chain) that CAMB was called 691200 times, which 
is more than 50 times the number of CAMB calls made 
in our case (including the calls for two dimensional grids 
used for contours). 

Here it is also important to mention that parallelizing 
our code is very straightforward. We use OpenMP to 
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compute Ci for a point in the six dimensional parameter 
space and use MPI to distribute particles among differ- 
ent MPI nodes. At every step, particles are distributed 
among MPI nodes and after they return the value of the 
fitness function, the master node computes Pbest and 
Gbest and updates the positions and velocities of par- 
ticles. We have tested our code on a Linux cluster us- 
ing 15 MPI nodes, where each node has 2 AMD quad 
core Opetron 2.6 GHz processors. Within a node we use 
OpenMP for computing the angular power spectrum us- 
ing as many number of threads as the number of cores are 
present. We have also ported our code on a Cray CXI 
system with six nodes, where each node has 2 Intel Xeon 
2.67 Ghz 6 core processors. In a case when the number 
of nodes can divide the number of PSO particles, there is 
no difficulty with load balance. Since at every PSO step 
a very small amount of data has to be communicated 
among MPI nodes, we have found that MPI collective 
communications calls like "broadcast" and "gather" are 
much more efficient than regular "send" and "receive" 
calls. In the present run, the result of which are reported 
here, it took roughly two and half hours for the standard 
PSO run to finish on a Linux cluster with 10 nodes with 
each node having 2 AMD quad core Opetron 2.6 GHz 
processors. The convergence was found just after 159 
steps with 30 PSO particles. 

It is not very straightforward to compare the perfor- 
mance of our PSO code and that of the commonly used 



code COSMOMC mainly because: 

1. The convergence criteria in PSO is slightly different 
than that in COSMOMC. 

2. The angular power spectrum C; computation in 
COSMOMC is optimized by selecting only a subset 
of particles which change their value, i.e., fast-slow 
parameters. There is no such operation in PSO. 

3. It is a common practice to "thin" chains in MCMC 
which means that not all sampled points are used 
for the final result that is not the case in PSO. 

4. Inputs for PSO and COSMOMC are different, in 
the sense COSMOMC needs a a guess covariance 
matrix, widths of the final one dimensional proba- 
bility distribution and a starting point, apart from 
the search range. Which is not the case in PSO in 
which we only need to specify a reasonable search 
range. 

In Q four chains with each having 30,000 points are 
needed for convergence, but in our typical run with six 
parameter model we never need more than 8-9,000 com- 
putations. Here it is important to note that the conver- 
gence in PSO also depend on value of design parameters. 
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FIG. 8: This figure shows how the binned primordial power 
spectrum (20 logarithmic bins over the k range) changes as 
PSO progresses (line 1 is for the initial PPS and 6 is for the 
final PPS). The lower and upper values of the power in bins 
are represented by the dashed line. Starting with a power law 
PPS we found that a power spectrum which has low power 
in some bins and high in others fits better than a power law 
model. 




FIG. 9: The red, black and blue lines in the above figure 
represent the best fit angular power spectrum recovered from 
PSO, standard LCDM power spectrum and the binned power 
spectrum of WMAP seven year data respectively. Note that 
at low I the angular power spectrum with binned PPS fits 
better as compared to the standard power law PPS to the 
observed data (the improvement in A\ 2 is around 7). 



V. DISCUSSION AND CONCLUSION 

In the present work we have demonstrated the appli- 
cation of Particles Swarm Optimization or PSO for cos- 
mological parameter estimation from CMB data, which 
we believe has not been done earlier in any other study. 
Being a different method, PSO can be used as an alter- 
native technique for parameter estimation, particularly 



when one is mainly interested in the location of the best 
fit point. The main focus of our present work was to 
demonstrate the method, in place of producing quanti- 
tative results and comparing those with other methods, 
that we leave for our future work. We have not only 
shown how to compute the values of the best-fit param- 
eters, but have also proposed a method to quantify the 
error bars on the estimated values. 

Based on a very simple algorithm, PSO has many in- 
teresting features some are as follows: 

1. PSO has very few design parameters , the values of 
which can be easily fixed. 

2. By tuning the values of the design parameters, PSO 
can be made more efficient for global or a local 
search although it is more useful for a global search. 

3. In PSO no approximation or extrapolation is made 
at any step (like in artificial neural network) and 
the optimization function is computed exactly at 
every point. 

4. As is claimed in other studies also PSO is very ef- 
ficient in searching for the global maximum when 
dimensionality of the search space is very high or 
there are a large number of local maxima present. 
Adding extra search dimensions in PSO is quite 
straightforward . 

5. PSO can be used for accelerated search of the global 
maximum since it always has some idea about the 
Gbest from the very beginning. 

6. In PSO we need to give only the search range as 
an input and no other information (as is needed in 
MCMC) about parameters, like covariance matrix, 
width of the final 1-d probability distribution or 
starting point is needed. This can be very useful 
for models about the parameters of which we do 
not have much prior information. 

7. Since this method is also based on Bayesian formal- 
ism so extra data sets can be easily incorporated in 
this method also. 

As a result of very high quality CMB data which has al- 
ready been provided by the WMAP satellite [4(| and will 
be provided by the Planck satellite [42[ , it has become an 
interesting exercise to consider much more complex mod- 
els (with very high dimensionality than just six to eleven 
dimensional models). The main motivation behind con- 
sidering such models has been to fit the "outliers" of 
the WMAP data (all years). In one of such exercise, in 
place of considering the primordial power spectrum just 
a power law, power in various bins can be left open for 
the fitting, which makes the dimensionality of the search 
space very large and and we have shown that PSO can 
be quite useful for such problems. 

In the present work we have demonstrated that particle 
swarm optimization can also be used for cosmological 
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parameter estimation from CMB data sets. Apart from 
discussing the technique in detail, we have also presented 
our results for the standard six parameters cosmological 
model. Along with giving the best fit parameters for the 
WMAP seven year data, we have also given some rough 
estimates of the errors, and have shown two dimensional 
contour plots in order to make the treatment complete. 



We have also shown an application of our method for 
a higher dimensional cosmological model in which the 
primordial power spectrum has power in logarithmic bins 
as free parameters. The main aim of the present work was 
to demonstrate a new method and present the results 
qualitatively. In future we plan to present a detailed 
quantitative analysis. 
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